PCA on face data
library(dplyr)
library(R.matlab)
library(imager)

Data
Load the example dataset (from ex7faces.mat, a MATLAB file), which contains a large set of example faces, each 32 px by 32 px (1024 pixels total per face), stored in grayscale.
x <- readMat("ex7faces.mat")
x %>% str()
List of 1
$ X: num [1:5000, 1:1024] -37.87 8.13 -32.87 -84.87 2.13 ...
- attr(*, "header")=List of 3
..$ description: chr "MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 14 23:46:35 2011 "
..$ version : chr "5"
..$ endian : chr "little"
x <- data.frame(x)
x %>% dim
[1] 5000 1024
x[1:6, 1:6]
X.1 X.2 X.3 X.4 X.5 X.6
1 -37.866314 -45.8663139 -53.866314 -51.866314 -40.866314 -33.86631
2 8.133686 -0.8663139 -8.866314 -15.866314 -17.866314 -16.86631
3 -32.866314 -34.8663139 -36.866314 -18.866314 6.133686 15.13369
4 -84.866314 -64.8663139 -47.866314 -42.866314 -38.866314 -28.86631
5 2.133686 6.1336861 5.133686 9.133686 10.133686 11.13369
6 60.133686 58.1336861 60.133686 59.133686 56.133686 41.13369

Visualize the example dataset (display the first 100 faces).
n <- nrow(x)
p <- ncol(x)
npix <- sqrt(p)
v <- 100
ll <- vector("list", v)
# (x[1,] %>% as.numeric()) %>% matrix(npix, npix) %>% head
x[1, ] %>% as.numeric() %>% matrix(npix, npix) %>% apply(2, range) %>% range
[1] -123.86631 75.13369
x[1, ] %>% as.numeric() %>% matrix(npix, npix) %>% apply(1, range) %>% range
[1] -123.86631 75.13369
as.cimg(x[1, ] %>% as.numeric(), x = npix, y = npix)
Image. Width: 32 pix Height: 32 pix Depth: 1 Colour channels: 1
for (i in 1:v) {
ll[[i]] <- as.cimg(x[i, ] %>% as.numeric, x = npix, y = npix)
ll[[i]][, , 1, 1] <- t(ll[[i]][, , 1, 1])
}
par(mfrow = c(sqrt(v), sqrt(v)), mar = c(0, 0, 0.5, 0), bg = "darkslategray")
for (i in 1:v) plot(ll[[i]], axes = FALSE)

PCA
- Evaluate whether PCA can be an effective method for dimensionality reduction. If so, run a PCA appropriately and visualize the eigenvectors, which in this case are eigenfaces.
x %>% ...
x %>% colMeans() %>% summary
x %>% var %>% diag %>% sqrt %>% summary
pca <- ... (... ... ...)

- What is the dimension of the eigenfaces matrix?
pca$...
[1] 1024 1024
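One possible way to fill in the blanks above is `prcomp` (the text later mentions `pca$x`, which `prcomp` produces; `princomp` or a direct eigendecomposition of the covariance matrix would also work). A minimal sketch on small simulated data standing in for `x`:

```r
# Sketch: PCA via prcomp on simulated data standing in for x (200 "faces",
# 16 "pixels"). Centering is essential; scaling is usually unnecessary here
# because all variables are pixel intensities on the same scale.
set.seed(1)
x_sim <- matrix(rnorm(200 * 16), nrow = 200, ncol = 16)
pca_sim <- prcomp(x_sim, center = TRUE, scale. = FALSE)
dim(pca_sim$rotation)  # p x p matrix of eigenvectors, one eigenface per column
```

On the real data `pca$rotation` is the 1024 x 1024 eigenfaces matrix, matching the output shown above.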
- Then we can visualize the eigenfaces as images (let us display the first 36).
m <- 36
eigenfaces <- ...
...
...
...
...
...

- How do you interpret them?
- Why do they appear as ghost-like faces?
- Do you recognize block-type and difference-type eigenfaces?
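One way the eigenface display could be approached is to reshape each column of the rotation matrix into a square image, as in this sketch (base `image()` is used instead of `imager` so the snippet stays dependency-free; `x_sim` is simulated stand-in data, not the real 5000 x 1024 faces):

```r
# Sketch: display the first m eigenvectors of a PCA as small square images.
set.seed(1)
npix_sim <- 8                                     # simulated images are 8 x 8
x_sim <- matrix(rnorm(200 * npix_sim^2), ncol = npix_sim^2)
pca_sim <- prcomp(x_sim)
m <- 4
eigenfaces_sim <- pca_sim$rotation[, 1:m]         # each column is one eigenface
op <- par(mfrow = c(2, 2), mar = c(0, 0, 0.5, 0))
for (i in 1:m) {
  img <- matrix(eigenfaces_sim[, i], npix_sim, npix_sim)
  # transpose and flip rows so image() draws with the usual orientation
  image(t(img)[, npix_sim:1], axes = FALSE, col = gray.colors(64))
}
par(op)
```

With the face data the same loop over `pca$rotation[, 1:36]`, reshaped to 32 x 32, produces the ghost-like eigenfaces discussed in the questions above.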
Dimension reduction
- What dimension should a lower-dimensional subspace have in order to still represent the original images accurately?
Some useful tools are shown below.
PC2 PC3
Standard deviation 481.22781 334.67523
Proportion of Variance 0.13705 0.06629
Cumulative Proportion 0.43980 0.50609
PC18 PC19
Standard deviation 106.96036 106.32226
Proportion of Variance 0.00677 0.00669
Cumulative Proportion 0.74998 0.75667
PC67 PC68
Standard deviation 48.88754 48.10736
Proportion of Variance 0.00141 0.00137
Cumulative Proportion 0.89903 0.90040
PC90 PC91
Standard deviation 39.17799 38.64754
Proportion of Variance 0.00091 0.00088
Cumulative Proportion 0.92467 0.92555
PC128 PC129
Standard deviation 28.99461 28.71160
Proportion of Variance 0.00050 0.00049
Cumulative Proportion 0.94996 0.95045
[1] 1650.134
PC85 PC86
Standard deviation 40.95566 40.17880
Proportion of Variance 0.00099 0.00096
Cumulative Proportion 0.92000 0.92095
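A common way to turn the summaries above into a choice of dimension is to pick the smallest k whose cumulative proportion of variance exceeds a threshold (the outputs suggest roughly k = 68 for 90% and k = 129 for 95% on the face data). A sketch on simulated stand-in data:

```r
# Sketch: smallest number of PCs whose cumulative proportion of variance
# reaches a given threshold. x_sim stands in for the real face matrix.
set.seed(1)
x_sim <- matrix(rnorm(200 * 16), ncol = 16)
pca_sim <- prcomp(x_sim)
cum_var <- cumsum(pca_sim$sdev^2) / sum(pca_sim$sdev^2)
k95 <- which(cum_var >= 0.95)[1]  # first PC index reaching 95%
k95
```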
Rebuilding faces
Now that we have the eigenfaces, we can project our original faces onto a subset of them, thus reducing each image from 1024 dimensions down to a vector of 100 dimensions.
Let’s project our data onto the first 100 eigenfaces (pca$x exists if you used prcomp above; otherwise you can easily compute the scores by a matrix product),
k <- 100
z <- pca$x[, 1:k]
and then rebuild the original faces using the eigenfaces and the encodings for each image,
x_approx <- z %*% ... + ...
x_approx %>% dim
[1] 5000 1024
resulting in the images below (consider just the first 100 faces for a comparison with the original ones above):
By using less than 10% of the original dimensions we are able to reconstruct the original pictures quite well.
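The reconstruction step above can be sketched as follows (simulated stand-in data; the key point is mapping the scores back through the transposed eigenvectors and re-adding the column means removed by centering):

```r
# Sketch: project onto the first k eigenvectors, then rebuild the data.
# With the real faces and k = 100 the result is 5000 x 1024, as shown above.
set.seed(1)
x_sim <- matrix(rnorm(200 * 16), ncol = 16)
pca_sim <- prcomp(x_sim, center = TRUE)
k <- 5
z_sim <- pca_sim$x[, 1:k]                          # k-dimensional encodings
x_approx_sim <- z_sim %*% t(pca_sim$rotation[, 1:k]) +
  matrix(pca_sim$center, nrow(x_sim), ncol(x_sim), byrow = TRUE)
dim(x_approx_sim)
```

Using all p components instead of the first k reproduces the original matrix exactly, which is a handy sanity check on the formula.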